An Experiment of Use and Reuse of Verb Valency in Morphosyntactic Disambiguation and Machine Translation for Euskara and North Sámi
Abstract
There are a number of well-known resources dealing with verb valency, including PropBank (Palmer et al., 2005), VerbNet (Kipper et al., 2006) and VALLEX (Hajič et al., 2003). These include thematic roles, morpho-syntactic specifications and selection preferences. A comparatively wide definition of valency, including subcategorization information on all of these linguistic levels, is applied here. However, these resources have not often been used in rule-based NLP tasks such as machine translation or disambiguation. Bick (2000) uses syntactic verb valency tags for various NLP tasks; they specify transitivity preferences such as "preferably transitive, but potentially intransitive" as well as selection preferences, e.g. a human accusative. The use of verb valency belongs to a high level of grammatical analysis and requires other elaborate linguistic resources. Agirre et al. (2009) successfully apply valency information, i.e. case subcategorization information, to the Spanish-to-Euskara MT system Matxin in order to improve NP/PP translation. They present different kinds of tests that enrich their machine translation system with different techniques. In all cases, the combinations of techniques that include valency information produce the best results, especially in recall and F-score. This paper describes an experiment in applying verb valency to Euskara and North Sámi rule-based NLP applications, i.e. morphosyntactic disambiguation and machine translation. Ten frequent verbs in each language are annotated to improve the analysis, and the effects on the applications are then evaluated. The main objective of the experiment is to improve linguistic resources for North Sámi and Euskara by taking advantage of pre-existing resources in one language and transferring them to the other.
Other work (Antonsen et al., 2010) has shown that the reuse of grammatical resources between both related and unrelated endangered languages is possible and provides useful results, especially on a high level of linguistic analysis. Antonsen et al. (2010) describe in particular the reuse of the dependency grammar. A number of problems that syntax alone cannot handle can be resolved by the semantically richer information included in verb valency. Verb valency annotation is applied on a high level of linguistic analysis and is therefore useful for reuse even between unrelated languages.
Related resources
Verb sense disambiguation in Machine Translation
We describe experiments in Machine Translation using word sense disambiguation (WSD) information. This work focuses on WSD in verbs, based on two different approaches – verbal patterns based on corpus pattern analysis and verbal word senses from valency frames. We evaluate several options of using verb senses in the source-language sentences as an additional factor for the Moses statistical mac...
Using Probabilistic Natural Language Parsers to Improve the Translation of English Phrasal Verbs into Persian
Machine translation of English sentences faces a big problem when it deals with phrasal verbs. A phrasal verb is a common structure in English, formed by combining a verb and a preposition, a verb and an adverb, or a verb with both an adverb and a preposition. The meaning of a phrasal verb is not compositional. The second part of the phrasal verbs which often is a preposition is called parti...
On Automatic Assignment of Verb Valency Frames in Czech
Many recent NLP applications, including machine translation and information retrieval, could benefit from semantic analysis of language data on the sentence level. This paper presents a method for automatic disambiguation of verb valency frames on Czech data. For each verb occurrence, we extracted features describing its local context. We experimented with diverse types of features, including m...
Developing Prototypes for Machine Translation between Two Sámi Languages
This paper describes the development of two prototype systems for machine translation between North Sámi and Lule Sámi. Experiments were conducted in rule-based machine translation (RBMT), using the Apertium platform, and statistical machine translation (SMT) using the Mosesdecoder. The experiments show that both approaches have their advantages and disadvantages, and that they can both make us...
Verb Valency Frames Disambiguation: Dissertation Summary
This is a summary of the author’s PhD dissertation defended on September 17, 2007 at the Faculty of Mathematics and Physics, Charles University in Prague. Semantic analysis has become a bottleneck of many natural language applications. Machine translation, automatic question answering, dialog management, and others rely on high quality semantic analysis. Verbs are central elements of clauses wit...